Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms

نویسندگان

  • Mamoru Komachi
  • Taku Kudo
  • Masashi Shimbo
  • Yuji Matsumoto
چکیده

Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate the semantic drift of bootstrapping has the same root as the topic drift of Kleinberg’s HITS, using a simplified graphbased reformulation of bootstrapping. We confirm that two graph-based algorithms, the von Neumann kernels and the regularized Laplacian, can reduce semantic drift in the task of word sense disambiguation (WSD) on Senseval-3 English Lexical Sample Task. Proposed algorithms achieve superior performance to Espresso and previous graph-based WSD methods, even though the proposed algorithms have less parameters and are easy to calibrate.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Bootstrapping Algorithm for Automatically Harvesting Semantic Relations

In this paper, we present Espresso, a weakly-supervised iterative algorithm combined with a web-based knowledge expansion technique, for extracting binary semantic relations. Given a small set of seed instances for a particular relation, the system learns lexical patterns, applies them to extract new instances, and then uses the Web to filter and expand the instances. Preliminary experiments sh...

متن کامل

HITS-based Seed Selection and Stop List Construction for Bootstrapping

In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graphbased approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select ...

متن کامل

Graph-Based Seed Set Expansion for Relation Extraction Using Random Walk Hitting Times

Iterative bootstrapping methods are widely employed for relation extraction, especially because they require only a small amount of human supervision. Unfortunately, a phenomenon known as semantic drift can affect the accuracy of iterative bootstrapping and lead to poor extractions. This paper proposes an alternative bootstrapping method, which ranks relation tuples by measuring their distance ...

متن کامل

Design and implementation of Persian spelling detection and correction system based on Semantic

Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors.  Also developing Persian tools will provide Persian progr...

متن کامل

Learning Semantic Lexicons using Graph Mutual Reinforcement based Bootstrapping

Bootstrapping has been received a amount of attentions in many fields and achieved good results. While semantic lexicons also have been proved to be useful for many natural language processing tasks. This paper presents an approach to learn semantic lexicons using a new bootstrapping method which is based on Graph Mutual Reinforcement. The approach uses only unlabeled data and a few of seed wor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008